Conversational spontaneous speech synthesis using average voice model

نویسندگان

  • Tomoki Koriyama
  • Takashi Nose
  • Takao Kobayashi
چکیده

This paper describes conversational spontaneous speech synthesis based on hidden Markov model (HMM). To reduce the amount of data required for model training, we utilize an average-voice-based speech synthesis framework, which has been shown to be effective for synthesizing speech with arbitrary speaker’s voice using a small amount of training data. We examine several kinds of average voice model using readingstyle speech and/or conversation-style speech. We also examine an appropriate utterance unit for conversational speech synthesis. Experimental results show that the proposed two-stage model adaptation method improves the quality of synthetic conversational speech.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prediction and Realisation of Conversational Characteristics by Utilising Spontaneous Speech for Unit Selection

Unit selection speech synthesis has reached high levels of naturalness and intelligibility for neutral read aloud speech. However, synthetic speech generated using neutral read aloud data lacks all the attitude, intention and spontaneity associated with everyday conversations. Unit selection is heavily data dependent and thus in order to simulate human conversational speech, or create synthetic...

متن کامل

Utilising spontaneous conversational speech in HMM-based speech synthesis

Spontaneous conversational speech has many characteristics that are currently not well modelled in unit selection and HMM-based speech synthesis. But in order to build synthetic voices more suitable for interaction we need data that exhibits more conversational characteristics than the generally used read aloud sentences. In this paper we will show how carefully selected utterances from a spont...

متن کامل

Accounting for Voice-Quality Variation

This paper proposes a two-layer model of the information carried in the speech signal. It attempts to define the role of prosody with a wider scope than has previously been considered in speech synthesis or linguistic research, by taking into account affective information in addition to that of linguistic content. The work is based on analysis of a large corpus of spontaneous conversational spe...

متن کامل

Evaluating expressive speech synthesis from audiobooks in conversational phrases

CNGL, School of Computer Science and Informatics, University College Dublin Dublin, Ireland {eva.szekely|mohamed.abou-zleikha}@ucdconnect.ie, {joao.cabral|peter.cahill|julie.berndsen}@ucd.ie Abstract Audiobooks are a rich resource of large quantities of natural sounding, highly expressive speech. In our previous research we have shown that it is possible to detect different expressive voice sty...

متن کامل

Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech

This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit-selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five-years of recordings of spontaneous conversational speeech in a wide range of actual...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010